You can find all related materials, links and references there.
December 16, 2019
Missing data is ubiquitous.
Ad hoc solutions may yield invalid inferences (Van Buuren 2018).
Rubin (1987) proposed the framework of multiple imputation (MI).
\[\begin{align} B&=\frac{T}{M-1} \sum_{m=1}^{M}\left(\bar{\theta}^{(\cdot m)}-\bar{\theta}^{(\cdot \cdot)}\right)^{2}, \quad \text{where} \quad \bar{\theta}^{(\cdot m)}=\frac{1}{T} \sum_{t=1}^{T} \theta^{(t m)} \\ &\quad \text{ and } \quad \bar{\theta}^{(\cdot \cdot)}=\frac{1}{M} \sum_{m=1}^{M} \bar{\theta}^{(\cdot m)}, \\ W&=\frac{1}{M} \sum_{m=1}^{M} s_{m}^{2}, \quad \text{where} \quad s_{m}^{2}=\frac{1}{T-1} \sum_{t=1}^{T}\left(\theta^{(t m)}-\bar{\theta}^{(\cdot m)}\right)^{2}, \\ \widehat{R}&=\sqrt{\frac{\widehat{\operatorname{var}}^{+}(\theta | y)}{W}}, \quad \text{where} \quad \widehat{\operatorname{var}}^{+}(\theta | y)=\frac{T-1}{T} W+\frac{1}{T} B. \end{align}\]
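The quantities above can be computed directly in base R. The following is a minimal sketch, not the code used in the study: it assumes the draws are stored in a hypothetical matrix `theta` with the T iterations in rows and the M chains (imputations) in columns, and it takes the weights in the pooled variance estimate to be (T-1)/T and 1/T. The function name `rhat` and the example data are illustrative.

```r
# Potential scale reduction factor for a T x M matrix of draws.
# Rows are iterations t = 1..T, columns are chains m = 1..M (assumed layout).
rhat <- function(theta) {
  n_iter <- nrow(theta)                      # T
  chain_means <- colMeans(theta)             # mean of each chain over iterations
  # Between-chain variance: T / (M - 1) * sum((chain mean - grand mean)^2)
  B <- n_iter * var(chain_means)
  # Within-chain variance: average of the M chain variances s_m^2
  W <- mean(apply(theta, 2, var))
  # Pooled estimate of the marginal variance
  var_plus <- (n_iter - 1) / n_iter * W + B / n_iter
  sqrt(var_plus / W)
}

set.seed(123)
# Five chains drawn from the same distribution: R-hat should be close to 1.
mixed <- matrix(rnorm(200 * 5), nrow = 200, ncol = 5)
print(rhat(mixed))

# The same chains with different means: R-hat should be clearly above 1.
shifted <- sweep(mixed, 2, c(0, 1, 2, 3, 4), "+")
print(rhat(shifted))
```

Values near 1 indicate that the between-chain variance adds little to the within-chain variance, while larger values signal that the chains have not yet mixed.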
Steps in the simulation study
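# Pseudocode outline of one simulation repetition (not runnable as written)
simulation <- function() {
  # Generate complete data: multivariate normal predictors and an outcome
  mvtnorm(X, Z1, Z2) %>%
    mutate(Y ~ X + Z1 + Z2) %>%
    # Repeat the procedure for each maximum number of iterations
    for (max_iterations in 1:100) {
      ampute() %>%                      # induce missingness
        impute() %T>%                   # impute; the tee pipe passes the imputations on unchanged
        convergence_diagnostics %>%     # side branch: diagnostics on the imputation chains
        lm(Y ~ X + Z1 + Z2) %>%         # analyse each imputed data set
        pool %>%                        # pool the estimates across imputations
        simulation_diagnostics %>%      # evaluate the pooled estimates
        c(., convergence_diagnostics)   # store both sets of diagnostics together
    }
}
# Run 1000 simulation repetitions
replicate(n = 1000, simulation())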
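One pass through the inner steps of the outline might look as follows. This is a hedged sketch, not the study's actual code: it assumes the `mice` and `mvtnorm` packages are installed, and the sample size, correlation of 0.5, regression coefficients, `m = 5`, `maxit = 10`, and the default `ampute()` settings are all illustrative choices that do not come from the slides.

```r
library(mice)     # ampute(), mice(), pool()
library(mvtnorm)  # rmvnorm()

set.seed(2019)

# Complete data: three correlated normal predictors and a linear outcome
# (sample size, correlation and coefficients are assumed for illustration).
sigma <- matrix(0.5, nrow = 3, ncol = 3)
diag(sigma) <- 1
dat <- as.data.frame(rmvnorm(1000, sigma = sigma))
names(dat) <- c("X", "Z1", "Z2")
dat$Y <- dat$X + dat$Z1 + dat$Z2 + rnorm(1000)

# Induce missingness with the ampute() defaults, then impute
# with a fixed number of iterations.
amp <- ampute(dat)$amp
imp <- mice(amp, m = 5, maxit = 10, printFlag = FALSE)

# Input for convergence diagnostics: chain means of X,
# with iterations in rows and imputations in columns.
chain_means_x <- imp$chainMean["X", , ]

# Fit the analysis model on each imputed data set and pool the results.
fit <- with(imp, lm(Y ~ X + Z1 + Z2))
pooled <- summary(pool(fit))
print(pooled[, c("term", "estimate", "std.error")])
```

In a full simulation, this block would be wrapped in a function, repeated over values of `maxit`, and the chain means would be passed to a convergence diagnostic alongside the pooled estimates.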
Rubin, Donald B. 1987. Multiple Imputation for Nonresponse in Surveys. Wiley Series in Probability and Mathematical Statistics Applied Probability and Statistics. New York, NY: Wiley.
Van Buuren, Stef. 2018. Flexible Imputation of Missing Data. Chapman & Hall/CRC.